Clustering Geometric Data Streams

Authors

  • Jiří Skála
  • Ivana Kolingerová
Abstract

Using recent results in data stream clustering, we present a modified approach to the facility location problem in the context of geometric data streams. We give insight into the existing algorithm from a less mathematical point of view, focusing on understanding and practical use, particularly by computer graphics experts. We propose a modification of the original data stream k-median clustering to solve facility location, which is the case when the number of clusters in the input data is not known a priori. Like the original, the modified version is capable of processing millions of points while using a rather small amount of memory. Based on our experiments with clustering geometric data, we present suggestions on how to set processing parameters. We also describe how the algorithm handles various distributions of input data within the stream. These findings may be applied back to the original algorithm.

CR Categories: I.5.3 [Computing Methodologies]: Pattern Recognition - Clustering; I.3.5 [Computing Methodologies]: Computer Graphics - Computational Geometry and Object Modeling
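The distinction the abstract draws between k-median and facility location can be illustrated with a minimal sketch of the two objectives. This is not the authors' streaming algorithm; it only evaluates the standard cost functions on a tiny point set. The function names and the uniform per-center `facility_cost` are assumptions made for illustration and do not come from the paper.

```python
import math

def assignment_cost(points, centers):
    """Sum of Euclidean distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)

def k_median_cost(points, centers):
    # k-median: the number of centers is fixed in advance,
    # so only the assignment distances are counted.
    return assignment_cost(points, centers)

def facility_location_cost(points, centers, facility_cost):
    # Facility location: every opened center adds facility_cost
    # (hypothetical uniform cost), so the number of centers is
    # itself part of what gets optimized.
    return facility_cost * len(centers) + assignment_cost(points, centers)

# Two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
one_center = [(0.0, 0.0)]
two_centers = [(0.0, 0.0), (10.0, 0.0)]

for f in (5.0, 30.0):
    c1 = facility_location_cost(points, one_center, f)
    c2 = facility_location_cost(points, two_centers, f)
    print(f"facility_cost={f}: one center {c1:.2f}, two centers {c2:.2f}")
```

With a low facility cost, opening two centers is cheaper overall; with a high one, a single center wins. In this formulation the cost parameter, rather than a preset k, determines how many clusters emerge.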

Similar resources

Distributed Dual Cluster Algorithm Based on Grid for Sensor Streams

In practical applications, Wireless Sensor Networks (WSN) generate massive data streams with dual attributes in the geography and optimization domains. The energy source of sensor nodes in a WSN is usually limited, and data stream transmission is known to be the largest consumer of energy in a WSN. Therefore, reducing the total data transmission and maximizing energy efficiency is the major challenge in WSN. In ad...

Sensitivity Sampling Over Dynamic Geometric Data Streams with Applications to k-Clustering

Sensitivity based sampling is crucial for constructing nearly-optimal coreset for k-means / median clustering. In this paper, we provide a novel data structure that enables sensitivity sampling over a dynamic data stream, where points from a high dimensional discrete Euclidean space can be either inserted or deleted. Based on this data structure, we provide a one-pass coreset construction for k...

Geometric Data Perturbation Techniques in Privacy Preserving On Data Stream Mining

Data mining is the information technology that extracts valuable knowledge from large amounts of data. Due to the emergence of data streams as a new type of data, data stream mining has recently become a very important and popular research issue. Privacy preservation in data stream mining is a very important issue. In this dissertation work, an approach based on Geometric data perturbation...

Monitoring Distributed Data Streams through Node Clustering

Monitoring data streams in a distributed system is a challenging problem with profound applications. The task of feature selection (e.g., by monitoring the information gain of various features) is an example of an application that requires special techniques to avoid a very high communication overhead when performed using straightforward centralized algorithms. Motivated by recent contributions...

Geometric Monitoring of Heterogeneous Streams (Long Version, with Proofs of the Theorems)

Interest in stream monitoring is shifting toward the distributed case. In many applications the data is high volume, dynamic, and distributed, making it infeasible to collect the distinct streams to a central node for processing. Often, the monitoring problem consists of determining whether the value of a global function, defined on the union of all streams, crossed a certain threshold. We wish...

Publication date: 2007